Abstract. A framework to analyze inference performance in densely connected single-layer
feed-forward networks is developed for situations where a given data set is composed of correlated 
patterns. The framework is based on the assumption that the left and right singular value bases 
of the given pattern matrix are generated independently and uniformly from Haar measures. 
This assumption makes it possible to characterize the objective system by a single function of 
two variables which is determined by the eigenvalue spectrum of the cross-correlation matrix of 
the pattern matrix. Links to existing methods for analysis of perceptron learning and Gaussian 
linear vector channels and an application to a simple but nontrivial problem are also shown. 



1. Introduction 

Inference from data is one of the most significant problems in information science, and
perceptrons (or single-layer feed-forward networks) are often included in widely used devices
for solving this problem. In the general scenario, for a given $N$-dimensional input pattern
$x = (x_1, x_2, \ldots, x_N)^{\rm T}$, such a network returns an output $y$, which may be a
continuous/discrete single/multidimensional variable, following a conditional probability
distribution $P(y|x;w) = P(y|\Delta)$, where T denotes the matrix transpose,
$w = (w_1, w_2, \ldots, w_N)^{\rm T}$ denotes the weight parameter of the perceptron and
$\Delta = N^{-1/2} w \cdot x$. The scale factor $N^{-1/2}$ is introduced to ensure that the
components of $w$ and $x$ are typically of $O(1)$ in the limit $N \to \infty$. Given a
data set $\xi^p = \{(x_1,y_1), (x_2,y_2), \ldots, (x_p,y_p)\}$, the Bayes formula

$P(w|\xi^p) = \frac{P(w) \prod_{\mu=1}^{p} P(y_\mu|\Delta_\mu)}{Z_P(\xi^p)}$   (1)

provides us with a useful basis for constructing the optimal inference, which may for example
involve estimation of the parameter $w$, or prediction of outputs for novel input patterns.
Here $P(w)$ is a certain prior distribution of $w$, $\Delta_\mu = N^{-1/2} w \cdot x_\mu$
($\mu = 1, 2, \ldots, p$) and the normalization factor
$Z_P(\xi^p) = {\rm Tr}_w P(w) \prod_{\mu=1}^{p} P(y_\mu|\Delta_\mu)$ serves as a partition function, where
${\rm Tr}_w$ denotes summation (or integration) over all possible states of $w$.
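For very small systems, the posterior (1) and its partition function can be evaluated by brute-force enumeration, which makes the role of ${\rm Tr}_w$ concrete. The following is a minimal sketch under illustrative assumptions that are not taken from the text at this point: binary weights $w \in \{-1,+1\}^N$, a uniform prior, a noise-free sign output, Gaussian input patterns, and labels produced by a hypothetical teacher vector. The function and variable names are likewise made up for the illustration.

```python
import itertools
import math
import random


def likelihood(y, delta):
    """Noise-free sign output: P(y | delta) = 1 if y * delta > 0, else 0."""
    return 1.0 if y * delta > 0 else 0.0


def posterior(patterns, labels):
    """Return ({w: P(w | data)}, Z) by summing over all w in {-1, +1}^N."""
    n = len(patterns[0])
    prior = 1.0 / 2 ** n  # uniform prior P(w): an illustrative assumption
    weights = {}
    for w in itertools.product((-1, 1), repeat=n):
        p = prior
        for x, y in zip(patterns, labels):
            # Delta_mu = N^{-1/2} w . x_mu
            delta = sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(n)
            p *= likelihood(y, delta)
        weights[w] = p
    # Partition function Z_P = Tr_w P(w) prod_mu P(y_mu | Delta_mu)
    z = sum(weights.values())
    return {w: p / z for w, p in weights.items()}, z


rng = random.Random(0)
N, P_COUNT = 7, 4
teacher = tuple(rng.choice((-1, 1)) for _ in range(N))  # hypothetical teacher
patterns = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P_COUNT)]
labels = [1 if sum(t * xi for t, xi in zip(teacher, x)) > 0 else -1
          for x in patterns]
post, Z = posterior(patterns, labels)
```

The cost grows as $2^N$, which is the computational difficulty that motivates the mean field schemes discussed in this article.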

In general, equation (1) can be regarded as the canonical distribution of a virtual spin system
which is subject to random interactions. This similarity has motivated cross-disciplinary research
across the fields of statistical mechanics and neural information processing over the last two
decades, which has led to the discovery of various complex behaviors in the learning processes of
neural networks [1, 2, 3] and to the development of families of advanced mean field approximation
algorithms that practically overcome the intrinsic computational difficulties underlying inference
in large networks [4, 5].

More recently, inference in the style of equation (1) is also being researched actively in
another context; namely, in the study of linear vector channels for wireless communication.
In this context, multiple information symbols denoted by $w$ are simultaneously transmitted
through a single channel, linearly transformed into $\Delta_\mu = N^{-1/2} x_\mu \cdot w$
($\mu = 1, 2, \ldots, p$). At the receiver's terminal, the transmitted symbols $w$ have to be
estimated from the received signals $y_\mu$ ($\mu = 1, 2, \ldots, p$). Under the assumption that
the channel and the prior distribution of information symbols are modeled as
$\prod_{\mu=1}^{p} P(y_\mu|\Delta_\mu)$ and $P(w)$, respectively, equation (1) allows
the optimal demodulation scheme. The similarity between problems of inference and disordered
spin systems again enables nontrivial performance analysis [6, 7, 8, 9, 10, 11, 12, 13]
and development of advanced approximate demodulation algorithms [14, 15, 16, 17, 18] for large
systems.

Although statistical mechanical schemes have been applied successfully to various inference
problems of the form of equation (1) in such ways, there still remain several research directions
to explore. Investigation of inference from correlated patterns is a typical example of such a
problem. For theoretical simplicity, most existing research on perceptron learning is based on
the assumption that the input vectors are independently generated from an isotropic distribution
[1, 2, 3]. However, real world data is usually somewhat biased and correlated
across components, which makes it difficult to utilize the developed schemes directly for data
analysis beyond a conceptual level. Exploration of correlated patterns is also important in
the study of linear vector channels because the matrix entries of the linear transformation are
generally correlated with each other due to the spatial proximity of antennas and for the purpose
of optimizing communication performance [19, 20]. Recently, the author and his colleagues have
developed a framework to handle such situations based on a formula of random matrix theory [21, 22].
However, the scheme we have developed is still not fully satisfactory because it is applicable
only to Gaussian channels. In order to deal with more general situations, further development
is required.

The purpose of this article is to provide such a development. More precisely, we will develop
a framework to analyze inference offered by equation (1) when entries of the pattern matrix

$X = N^{-1/2} (x_1, x_2, \ldots, x_p)^{\rm T}$   (2)

are correlated. A similar direction has already been followed by Opper and Winther [23, 24, 25].
However, their formalism, developed for densely connected networks of two-body interactions,
is highly general, and therefore properties that hold specifically for models satisfying equation
(1) are not fully utilized. Hence we develop here a specific formalism for analyzing inference
problems expressed by means of equation (1).

This article is organized as follows. In section 2, models that we will investigate are
introduced. For characterizing correlated patterns, we assume that the pattern matrix (2)
is randomly generated under the constraint that singular values of the matrix obey a given
distribution. Section 3 is the main part of this article, in which two analytical schemes are
developed. One takes as its basis the replica method [26], which makes it possible to assess the
typical inference performance of the objective system by averaging the pattern matrix $X$ with
respect to an assumed distribution. The other is developed for approximately evaluating the
average of $w$ with respect to equation (1) for a given specific $X$ (or $\xi^p$), which corresponds
to the Thouless-Anderson-Palmer approach [27] in spin glass research. It is shown that a two-variable
function, which we denote by $F(x, y)$ and which is determined by the eigenvalue spectrum of the
cross-correlation matrix $X^{\rm T}X$ and the pattern ratio $\alpha = p/N$, plays an important role in both
schemes. Links to existing methods of analysis of the schemes that we develop are indicated
in section 4 in conjunction with an application to a simple example problem. The final section
contains a summary.



2. Model definition 

An expression of the singular value decomposition

$X = U^{\rm T} D V$,   (3)

of the pattern matrix $X$ is the basis of our framework, where $D = {\rm diag}(d_k)$ is a $p \times N$
diagonal matrix, and $U$ and $V$ are $p \times p$ and $N \times N$ orthogonal matrices, respectively.
Linear algebra guarantees that an arbitrary $p \times N$ matrix $X$ can be decomposed according
to equation (3). The singular values of $X$, $d_k$ ($k = 1, 2, \ldots, \min(p, N)$), are linked to the
eigenvalues of the cross-correlation matrix $X^{\rm T}X$, $\lambda_k$ ($k = 1, 2, \ldots, N$), as
$\lambda_k = d_k^2$ ($k = 1, 2, \ldots, \min(p, N)$) and $\lambda_k = 0$ otherwise, where $\min(p, N)$
denotes the lesser value of $p$ and $N$. In order to handle correlations in $X$ analytically, we assume
that the orthogonal matrices $U$ and $V$ are uniformly and independently generated from the Haar
measures of $p \times p$ and $N \times N$ orthogonal matrices, respectively, and that the empirical
eigenvalue spectrum of $X^{\rm T}X$,
$N^{-1} \sum_{k=1}^{N} \delta(\lambda - \lambda_k) = (1 - \min(p, N)/N)\, \delta(\lambda) + N^{-1} \sum_{k=1}^{\min(p,N)} \delta(\lambda - d_k^2)$,
converges to a certain distribution $\rho(\lambda)$ as $N$ and $p$ tend to infinity while keeping
$\alpha = p/N$ of the order of unity.
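The ensemble assumed above can be sampled directly for small sizes. The sketch below is only an illustration: it draws Haar-distributed orthogonal matrices by Gram-Schmidt orthonormalization of Gaussian vectors (a standard construction, not one specified in this article) and assembles $X = U^{\rm T} D V$ from a list of singular values. The helper names and the chosen singular values are hypothetical.

```python
import math
import random


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:
            # Remove the component of v along each previously accepted row.
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def sample_pattern_matrix(singular_values, p, n, rng):
    """Return X = U^T D V (p x n) with Haar U (p x p) and V (n x n)."""
    u = haar_orthogonal(p, rng)
    v = haar_orthogonal(n, rng)
    # (U^T D V)[mu][i] = sum_k U[k][mu] * d_k * V[k][i], k < min(p, n)
    return [
        [sum(u[k][mu] * d * v[k][i] for k, d in enumerate(singular_values))
         for i in range(n)]
        for mu in range(p)
    ]


rng = random.Random(1)
p, n = 6, 10
d = [0.5 + 0.1 * k for k in range(min(p, n))]  # hypothetical singular values
X = sample_pattern_matrix(d, p, n, rng)
# tr(X^T X) equals the sum of the eigenvalues lambda_k = d_k^2.
frob = sum(x * x for row in X for x in row)
```

Because $U$ and $V$ are orthogonal, the squared Frobenius norm of $X$ equals $\sum_k d_k^2$, which offers a quick consistency check of the construction.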

For generality, we assume that the outputs $y = (y_1, y_2, \ldots, y_p)^{\rm T}$ for $X$ are generated from a
generative model

$Q(y|X) = {\rm Tr}_w Q(w) \prod_{\mu=1}^{p} Q(y_\mu|\Delta_\mu) = Z_Q(\xi^p)$,   (4)

where the prior and conditional probabilities of this model, $Q(w)$ and $Q(y|\Delta)$, may differ from
those of the recognition model, $P(w)$ and $P(y|\Delta)$, which is used in equation (1). For a fixed
data set $\xi^p = \{(x_1,y_1), (x_2,y_2), \ldots, (x_p,y_p)\} = (X, y)$, $Q(y|X) = Z_Q(\xi^p)$ serves as the
partition function of the correct posterior distribution of $w$,
$Q(w|\xi^p) = Q(w) \prod_{\mu=1}^{p} Q(y_\mu|\Delta_\mu) / Z_Q(\xi^p)$.
For analytical tractability, we also assume that both the prior distributions of the generative
and recognition models can be factorized as $Q(w) = \prod_{i=1}^{N} Q(w_i)$ and $P(w) = \prod_{i=1}^{N} P(w_i)$,
respectively.

3. Analysis 

3.1. Analysis of the generative model and the F-function

We first analyze properties of the generative model since outputs $y$ of the data set $\xi^p$ are
generated by this model following equation (4). For this purpose, we introduce an expression

$\prod_{\mu=1}^{p} Q(y_\mu | N^{-1/2} w \cdot x_\mu) = \int \prod_{\mu=1}^{p} d\Delta_\mu\, Q(y_\mu|\Delta_\mu)\, \delta(\Delta_\mu - N^{-1/2} w \cdot x_\mu) = {\rm Tr}_u \exp[(iu)^{\rm T} X w] \prod_{\mu=1}^{p} \hat{Q}_{y_\mu}(u_\mu)$,   (5)

where $i = \sqrt{-1}$, $u = (u_1, u_2, \ldots, u_p)^{\rm T}$ and
$\hat{Q}_{y_\mu}(u_\mu) = \int d\Delta_\mu \exp[-i u_\mu \Delta_\mu]\, Q(y_\mu|\Delta_\mu)/(2\pi)$. Next,
we substitute equation (3) into equation (5) and take an average with respect to the orthogonal
matrices $U$ and $V$. For this evaluation, it is noteworthy that for fixed sets of dynamical variables
$w$ and $u$, $\tilde{w} = Vw$ and $\tilde{u} = Uu$ behave as continuous random variables which are uniformly
generated under the strict constraints

$\frac{1}{N} |\tilde{w}|^2 = \frac{1}{N} |w|^2 = T_w$,   (6)

$\frac{1}{p} |\tilde{u}|^2 = \frac{1}{p} |u|^2 = T_u$,   (7)

when $U$ and $V$ are independently and uniformly generated from the Haar measures. In the limit
as $N, p \to \infty$ while keeping $\alpha = p/N \sim O(1)$, this yields an expression

$\frac{1}{N} \ln \overline{\exp[(iu)^{\rm T} X w]} = F(T_w, T_u)$,   (8)

where $\overline{\cdots}$ denotes averaging with respect to the Haar measures, the function $F(x, y)$ is defined
as

and $\langle \cdots \rangle_\rho$ indicates averaging with respect to the asymptotic eigenvalue spectrum of
$X^{\rm T}X$, $\rho(\lambda)$. ${\rm Extr}_\theta \{\cdots\}$ represents extremization with respect to $\theta$, which
corresponds to the saddle point assessment of a complex integral and therefore does not necessarily mean
operation of minimum or maximum. Expressions analogous to equations (8) and (9) are known as the
Itzykson-Zuber integral or G-function for ensembles of square (symmetric) matrices [28, 29, 30, 31]. These
equations imply that the annealed average of equation (5) is evaluated as
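$\frac{1}{N} \ln \overline{{\rm Tr}_y Z_Q(\xi^p)} = {\rm Extr}_{T_w, T_u} \{ F(T_w, T_u) + A_w(T_w) + A_u(T_u) \}$,   (10)

where

$A_w(T_w) = {\rm Extr}_{\hat{T}_w} \left\{ \frac{\hat{T}_w T_w}{2} + \ln \left[ {\rm Tr}_w Q(w) \exp\left( -\frac{\hat{T}_w}{2} w^2 \right) \right] \right\}$,   (11)

$A_u(T_u) = {\rm Extr}_{\hat{T}_u} \left\{ \frac{\alpha \hat{T}_u T_u}{2} + \alpha \ln \left[ {\rm Tr}_{y,u} \hat{Q}_y(u) \exp\left( -\frac{\hat{T}_u}{2} u^2 \right) \right] \right\}$.   (12)

Normalization constraints ${\rm Tr}_y Q(y|\Delta) = 1$ guarantee that ${\rm Tr}_y Z_Q(\xi^p) = 1$, which, in conjunction
with equations (10), (11) and (12), implies that $T_w = {\rm Tr}_w w^2 Q(w)$, $T_u = 0$, $\hat{T}_w = 0$
and $\hat{T}_u = \alpha^{-1} T_w \langle \lambda \rangle_\rho$. The physical implication is that, due to the central limit
theorem, $\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_p)^{\rm T}$ follows an isotropic Gaussian distribution

$P(\Delta) = (2\pi \hat{T}_u)^{-p/2} \exp\left[ -\frac{|\Delta|^2}{2 \hat{T}_u} \right]$   (13)

in the limit as $N, p \to \infty$, $\alpha = p/N \sim O(1)$ when $w$ is generated from $Q(w) = \prod_{i=1}^{N} Q(w_i)$, and
$U$ and $V$ are independently and uniformly generated from the Haar measures.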



3.2. Replica analysis 

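Now, we are ready to analyze equation (1). As $\xi^p$ is a set of predetermined random variables
depending on $X$ and the generative model (5), we utilize the replica method. This means that
we evaluate the $n$-th moments of the partition function $Z_P(\xi^p)$ for natural numbers $n \in \mathbb{N}$ as

$[Z_P^n(\xi^p)]_{\xi^p} = \overline{{\rm Tr}_y Q(y|X) Z_P^n(\xi^p)} = \overline{{\rm Tr}_y Z_Q(\xi^p) Z_P^n(\xi^p)}$
$= {\rm Tr}_{\{w^a\},\{u^a\}} \overline{\exp\left[ \sum_{a=0}^{n} (iu^a)^{\rm T} X w^a \right]}\; Q(w^0) \prod_{a=1}^{n} P(w^a)\; {\rm Tr}_y \prod_{\mu=1}^{p} \left( \hat{Q}_{y_\mu}(u_\mu^0) \prod_{a=1}^{n} \hat{P}_{y_\mu}(u_\mu^a) \right)$,   (14)

and assess the quenched average of free energy with respect to the data set $\xi^p$ as
$N^{-1} [\ln Z_P(\xi^p)]_{\xi^p} = \lim_{n \to 0} \frac{\partial}{\partial n} N^{-1} \ln [Z_P^n(\xi^p)]_{\xi^p}$, analytically
continuing expressions obtained for equation (14) from $n \in \mathbb{N}$ to real numbers $n \in \mathbb{R}$. Here,
$[\cdots]_{\xi^p} = \overline{{\rm Tr}_y Q(y|X)(\cdots)} = \overline{{\rm Tr}_y Z_Q(\xi^p)(\cdots)}$ represents the average with
respect to the data set $\xi^p$, and
$\hat{P}_{y_\mu}(u_\mu) = \int d\Delta_\mu \exp[-i u_\mu \Delta_\mu]\, P(y_\mu|\Delta_\mu)/(2\pi)$. $\{w^a\}$ and $\{u^a\}$
represent sets of dynamical variables $w^0, w^1, \ldots, w^n$ and $u^0, u^1, \ldots, u^n$, respectively, where the
replica indices $0$ and $1, 2, \ldots, n$ denote the generative and $n$ replicas of recognition models, respectively.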

For this procedure, a note similar to that for the evaluation of equation (8) is useful. Namely,
for fixed sets of dynamical variables $\{u^a\}$ and $\{w^a\}$, $\tilde{u}^a = U u^a$ and $\tilde{w}^a = V w^a$ behave as
continuous random variables which satisfy strict constraints

$\frac{1}{N} \tilde{w}^a \cdot \tilde{w}^b = \frac{1}{N} w^a \cdot w^b = q_w^{ab}$,   (15)

$\frac{1}{p} \tilde{u}^a \cdot \tilde{u}^b = \frac{1}{p} u^a \cdot u^b = q_u^{ab}$,   (16)

($a, b = 0, 1, \ldots, n$) when $U$ and $V$ are independently and uniformly generated from the Haar
measures. This indicates that equation (14) can be evaluated by the saddle point method with
respect to sets of macroscopic parameters $Q_w = (q_w^{ab})$ and $Q_u = (q_u^{ab})$ in the limit as $N, p \to \infty$,
$\alpha = p/N \sim O(1)$. In addition, intrinsic permutation symmetry among replicas indicates that it
is natural to assume that the $(n + 1) \times (n + 1)$ matrices $Q_w$ and $Q_u$ are of the form

at the saddle point. Here, $E = (e_0, e_1, \ldots, e_n)$ denotes an $(n + 1)$-dimensional orthonormal basis
composed of $e_0 = (1, 0, 0, \ldots, 0)^{\rm T}$, $e_1 = (0, n^{-1/2}, n^{-1/2}, \ldots, n^{-1/2})^{\rm T}$ and $n - 1$ orthonormal
vectors $e_2, e_3, \ldots, e_n$, which are orthogonal to both $e_0$ and $e_1$. Rather laborious but
straightforward calculation on the basis of expressions (17) and (18) yields

where $T_w = {\rm Tr}_w w^2 Q(w)$. This equation and evaluation of the volumes of dynamical variables
$\{w^a\}$ and $\{u^a\}$ under constraints (15) and (16) of the replica symmetric (RS) ansatz (17) and
(18) provide an expression for the average free energy

and

Here, $T_u = \alpha^{-1} T_w \langle \lambda \rangle_\rho$ and $Ds = ds \exp[-s^2/2]/\sqrt{2\pi}$ represents the Gaussian measure.
Expressions (19)-(22) are the main results of this article.



Two points are noteworthy here. The first is that a set of parameters determined by the
extremizing equation (20) represents typical macroscopic averages of the posterior distribution
(1), by which various performance measures can be evaluated [2]. Moreover, equation (20) itself
is linked to information theoretic measures for assessing inference performance. For example,
the Kullback-Leibler divergence (per output) between the generative and recognition models,
which represents a certain distance from the generative model and is related to the prediction
ability of the recognition model for novel data, is evaluated as

${\rm KL}(Q|P) = \frac{1}{p} \overline{{\rm Tr}_y Q(y|X) \ln \frac{Q(y|X)}{P(y|X)}} = \frac{1}{p} [\ln Z_Q(\xi^p)]_{\xi^p} - \frac{1}{p} [\ln Z_P(\xi^p)]_{\xi^p}$   (23)

utilizing equation (20) [32]. Equation (20), in conjunction with equation (13), can also be used
for calculating the typical mutual information (per output) between the parameter $w$ and the
output $y$, which represents the information content of $w$ that can be gained by observing the
output $y$ for typical pattern matrices $X$, as



I{W;Y) = - Tr Q{w) 
specific expressions of which, for problems of communication through additive channels, have
been derived in earlier studies [HI [3 El [33]. The other issue is that the current formalism can 
be applied not only to the RS analysis presented above but also to that of replica symmetry
breaking (RSB) [34]. Analysis of the local instability condition of the RS solution (17) and (18)
subject to infinitesimal perturbation of the form of the one step RSB yields

where

and



= Tr 1 DzDxQ (^y\^f^-^ 



X H -=z 



In 



DxP y\\/XuX + \/quZ 



(27) 



Equation (25) corresponds to the de Almeida-Thouless (AT) condition for the current system

Eg. 



3.3. The Thouless-Anderson-Palmer approach

The scheme developed so far can be used for macroscopically characterizing the inference
performance of equation (1) for typical samples of $\xi^p$. However, another method is necessary to
evaluate microscopic averages for an individual sample of $\xi^p$. The Thouless-Anderson-Palmer
(TAP) approach [27] known in spin glass research offers a useful guideline for this purpose.
Although several formalisms are known for this approximation scheme [1], we here follow the
one based on the Gibbs free energy because of its generality and wide applicability [25, 30].
Let us suppose a situation for which the microscopic averages of the dynamical variables

are required. The Gibbs free energy

$\Phi(m_w, m_u) = {\rm Extr}_{h_w, h_u} \{ h_w \cdot m_w + h_u \cdot m_u - \ln [Z_P(h_w, h_u)] \}$,   (30)

where

$Z_P(h_w, h_u) = {\rm Tr}_{u,w} \prod_{\mu=1}^{p} \hat{P}_{y_\mu}(u_\mu) \prod_{i=1}^{N} P(w_i) \exp[h_w \cdot w + h_u \cdot (iu) + (iu)^{\rm T} X w]$,   (31)

offers a useful basis for this objective as the extremization conditions of equation (30) generally
agree with equations (28) and (29). This indicates that one can evaluate the microscopic averages
(28) and (29) by extremization once the function of Gibbs free energy (30) is provided.

Unfortunately, exact evaluation of equation (30) is computationally difficult and therefore we
resort to approximation. For this purpose, we put a parameter $l$ in front of $X$ in equation (31),
which yields the generalized Gibbs free energy as

$\tilde{\Phi}(m_w, m_u; l) = {\rm Extr}_{h_w, h_u} \{ h_w \cdot m_w + h_u \cdot m_u - \ln [Z_P(h_w, h_u; l)] \}$,   (32)

where $Z_P(h_w, h_u; l) = {\rm Tr}_{u,w} \prod_{\mu=1}^{p} \hat{P}_{y_\mu}(u_\mu) \prod_{i=1}^{N} P(w_i) \exp[h_w \cdot w + h_u \cdot (iu) + (iu)^{\rm T} (lX) w]$.
This implies that the correct free energy (30) can be obtained as $\Phi(m_w, m_u) = \tilde{\Phi}(m_w, m_u; l = 1)$
by setting $l = 1$ in the generalized expression (32). One scheme to make use of this relation
is to perform the Taylor expansion around $l = 0$, for which $\tilde{\Phi}(m_w, m_u; l = 0)$ can be analytically
calculated as an exceptional case, and substitute $l = 1$ in the expression obtained, which is
sometimes referred to as the Plefka expansion [35]. However, evaluation of higher order terms,
which are not negligible for correlated patterns in general, requires a complicated calculation in
this expansion, which sometimes prevents the scheme from being practically tractable. In order
to avoid this difficulty, we take an alternative approach here, which is inspired by a derivative
of equation (32)

$\frac{\partial \tilde{\Phi}(m_w, m_u; l)}{\partial l} = -\left\langle (iu)^{\rm T} X w \right\rangle_l$,   (33)

where $\langle \cdots \rangle_l$ represents the average with respect to the generalized weight
$\prod_{\mu=1}^{p} \hat{P}_{y_\mu}(u_\mu) \prod_{i=1}^{N} P(w_i) \times \exp[h_w \cdot w + h_u \cdot (iu) + (iu)^{\rm T} (lX) w]$,
$h_w$ and $h_u$ of which are determined so as
to satisfy $\langle w \rangle_l = m_w$ and $\langle (iu) \rangle_l = m_u$, respectively [25]. The right hand side of this equation
is an average of a quadratic form containing many random variables. The central limit theorem
implies that such an average does not depend on details of the objective distribution but is
determined only by the values of the first and second moments. In order to construct a simple
approximation scheme, let us assume that the second moments are characterized macroscopically
by $\langle |w|^2 \rangle_l - |\langle w \rangle_l|^2 = N \chi_w$ and $\langle |u|^2 \rangle_l - |\langle u \rangle_l|^2 = p \chi_u$. Evaluating the right hand side
of equation (33) using a Gaussian distribution for which the first and second moments are
constrained as $\langle w \rangle_l = m_w$, $\langle (iu) \rangle_l = m_u$, $\langle |w|^2 \rangle_l - |\langle w \rangle_l|^2 = N \chi_w$ and
$\langle |u|^2 \rangle_l - |\langle u \rangle_l|^2 = p \chi_u$, and integrating from $l = 0$ to $l = 1$ yields

$\tilde{\Phi}(\chi_w, \chi_u, m_w, m_u; 1) - \tilde{\Phi}(\chi_w, \chi_u, m_w, m_u; 0) \simeq -m_u^{\rm T} X m_w - N F(\chi_w, \chi_u)$,   (34)

where the function $F(x, y)$ is provided as in equation (9) by the eigenvalue spectrum of $X^{\rm T}X$,
$\rho(\lambda) = N^{-1} \sum_{k=1}^{N} \delta(\lambda - \lambda_k)$, and the macroscopic second moments $\chi_w$ and $\chi_u$ are included in
arguments of the Gibbs free energy as the right hand side of equation (33) depends on them.
Utilizing this and evaluating $\tilde{\Phi}(\chi_w, \chi_u, m_w, m_u; 0)$, which is not computationally difficult since
interaction terms are not included, yields an approximation of the Gibbs free energy as

which is a general expression of the TAP free energy of the current objective system (1).
Extremization of this equation yields a set of TAP equations

solutions of which represent approximate values of the first and second moments of
the distribution (1) for a fixed sample of $X$ (or $\xi^p$). The terms
$-2 (\partial/\partial \chi_w) F(\chi_w, \chi_u)\, m_w$ and $(2/\alpha)(\partial/\partial \chi_u) F(\chi_w, \chi_u)\, m_u$
in equations (40) and (42) are generally referred to as the Onsager
reaction terms. Counterparts of these equations for systems of two-body interactions have been
presented in an earlier article [30]. Although we have assumed single macroscopic constraints
as characterizing the second moments, the current formalism can be generalized to include
component-wise multiple constraints for constructing more accurate approximations, which
leads to the adaptive TAP approach or, more generally, the expectation consistent approximate
schemes developed by Opper and Winther.



4. Examples 

4.1. Patterns of independently and identically distributed entries

In order to investigate the relationship with existing results, let us first apply the developed
methodologies to the case in which the entries of $X$ are independently drawn from an identical
distribution with zero mean and variance $N^{-1}$. This case is characterized by an eigenvalue
spectrum of Marčenko-Pastur type,
$\rho(\lambda) = [1 - \alpha]^+ \delta(\lambda) + (2\pi)^{-1} \lambda^{-1} \sqrt{[\lambda - \lambda_-]^+ [\lambda_+ - \lambda]^+}$,
where $[x]^+ = x$ for $x > 0$ and $0$, otherwise, and $\lambda_\pm = (\sqrt{\alpha} \pm 1)^2$ [20], which yields

$F(x, y) = -\frac{\alpha}{2} x y$.   (44)

This together with the relation $\langle \lambda \rangle_\rho = \alpha$, which holds for the current eigenvalue spectrum,
implies that equation (19) can be expressed as

$\mathcal{A}_0(\chi_w, \chi_u, q_w, q_u, m_w, m_u) = -\frac{\alpha}{2} (\chi_w \chi_u + q_w \chi_u - q_u \chi_w - 2 m_w m_u)$.   (45)

Inserting this into equation (20) and then performing an extremization with respect to $\chi_u$, $q_u$
and $m_u$ yields

$\hat{\chi}_u = \chi_w$, $\hat{q}_u = q_w$, $\hat{m}_u = m_w$,   (46)

where $\hat{\chi}_u$, $\hat{q}_u$ and $\hat{m}_u$ are the variational variables used in equation (22). This implies that the
replica symmetric free energy (20) can be expressed as

where the relation $T_u = \alpha^{-1} T_w \langle \lambda \rangle_\rho$ was utilized. This is equivalent to the general expression
of the replica symmetric free energy of a single layer perceptron for pattern matrices with
independently and identically distributed entries [2, 37].
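The two properties of the Marčenko-Pastur spectrum used in this subsection, namely that the continuous part carries total weight $\min(1, \alpha)$ and that $\langle \lambda \rangle_\rho = \alpha$, can be checked numerically. The following sketch is an illustration only; the function name and the midpoint-rule quadrature are choices made here, not part of the analysis above. It substitutes $\lambda = c + r \cos\theta$ with $c = 1 + \alpha$ and $r = 2\sqrt{\alpha}$, which removes the square-root endpoint behaviour.

```python
import math


def mp_continuous_moment(alpha, power, steps=20000):
    """Integrate lambda**power against the continuous part of rho(lambda).

    With lambda = c + r*cos(theta), c = 1 + alpha and r = 2*sqrt(alpha),
    sqrt((lambda - lam_minus) * (lam_plus - lambda)) equals r*sin(theta)
    and d(lambda) contributes another factor r*sin(theta).
    """
    c, r = 1.0 + alpha, 2.0 * math.sqrt(alpha)
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h  # midpoint rule avoids the interval endpoints
        lam = c + r * math.cos(theta)
        density_times_jacobian = (r * math.sin(theta)) ** 2 / (2.0 * math.pi * lam)
        total += lam ** power * density_times_jacobian * h
    return total


alpha = 0.5
mass = mp_continuous_moment(alpha, 0)  # weight of the continuous part
mean = mp_continuous_moment(alpha, 1)  # <lambda>_rho (the delta peak adds 0)
```

For $\alpha < 1$ the remaining weight $1 - \alpha$ sits in the delta peak at $\lambda = 0$, which does not contribute to the mean.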

4.2. Gaussian linear vector channel

The second example to show equivalent results to those obtained by earlier analysis is
that of a Gaussian linear vector channel, which is characterized by
$P(y|\Delta) = (2\pi\sigma^2)^{-1/2} \exp[-(y - \Delta)^2/(2\sigma^2)]$ and
$Q(y|\Delta) = (2\pi\sigma_0^2)^{-1/2} \exp[-(y - \Delta)^2/(2\sigma_0^2)]$. In this case, equation
(22) is evaluated as

■AuiXu,qu,mu) = ^ (^'^^ - ^" ~ ^ ('^^^'^ - lnx« - l) - ^Xu{Tu + crl), (48) 



while requiring that rriu/Xu = 1- Further, extremization with respect to qu in equation ()20p 
indicates that A^^, which is the counterpart of Ay in equation ^ ioY y = Xu, is set to 
a constant value A-,^.^ = cj^, which implies that {d/dxu)F{X'w,Xu) = (l/2)(o"^ — XZ^) ^^'^ 
Xu = cr-'^ + '^iacr'^)~^Xw{d/dxw)F{xw,Xu) hold. These, in conjunction with aTu = {X) p, 
indicate that equation ([20l) can be expressed as 

^[lnZp{^P)],p = Extr {Aw{Xw,qw,mw) 

(-^) + {- ^'-'y^O- + G' - I (.n(2,.^) + f) , (49) 

where

$G(x) = {\rm Extr}_\Lambda \left\{ -\frac{1}{2} \langle \ln(\Lambda - \lambda) \rangle_\rho + \frac{\Lambda}{2} x \right\} - \frac{1}{2} \ln x - \frac{1}{2}$,   (50)

is referred to as the Itzykson-Zuber integral or G-function in physics literature [29, 30, 31],
which is linked to the R-transform of the cross-correlation matrix $X^{\rm T}X$ used in free probability
theory [20, 38, 39]. Equation (49) is equivalent to the expression for the replica symmetric free
energy for Gaussian linear vector channels of a correlated channel matrix recently provided by
the author and his colleagues [21, 22].



4.3. Ability of the Ising perceptron to separate random orthogonal patterns

In order to demonstrate the utility of the methodologies we have developed, as our final example
we take up a simple but nontrivial problem concerning the separation ability of the Ising
perceptron. Let us consider a simple perceptron of binary weight $w \in \{+1, -1\}^N$, $P(y|\Delta) = 1$
for $y\Delta > 0$ and $0$, otherwise, where $y = \pm 1$. It is known that, in typical cases, this network can
correctly separate a set of random patterns $\xi^p = \{(x_1,y_1), (x_2,y_2), \ldots, (x_p,y_p)\} = (X, y)$ up to
$\alpha_c \simeq 0.833$, when the elements of $x_\mu$ are independently generated from an isotropic distribution
and the elements of $y_\mu = \pm 1$ are independently and randomly assigned with a probability of one
half for $\mu = 1, 2, \ldots, p$ [40, 41, 42]. Our question here is how $\alpha_c$ is modified when the pattern
matrix $X$ is generated randomly in such a way that the patterns $x_\mu$ are orthogonal to each
other. In order to answer this question, we employ the replica and TAP methods developed in
preceding sections for $\rho(\lambda) = (1 - \alpha)\delta(\lambda) + \alpha\delta(\lambda - 1)$, which represents the eigenvalue spectrum
of the random orthogonal patterns, assuming $0 < \alpha < 1$. Figure 1 shows how the entropy of
$w$ depends on the pattern ratio $\alpha$. The curve indicates the theoretical prediction of the replica
analysis while the markers denote the averages of entropy obtained by the TAP method over
100 samples for $N = 500$ systems. The error bars are smaller than the markers. Solutions of the
TAP method are obtained by a method of iterative substitution, details of which are reported
elsewhere [18]. Although the curve and the markers exhibit excellent agreement for the data
points $\alpha = 0.1, 0.2, \ldots, 0.8$, we were not able to obtain a reliable result for $\alpha = 0.9$, at which
the iterative scheme does not converge in most cases even after 1000 iterations. This may be
a consequence of RSB since the replica analysis indicates that the AT stability is broken at
$\alpha_{\rm AT} \simeq 0.810$. Therefore $\alpha_c \simeq 0.940$ indicated by the condition of vanishing entropy is to be
regarded not as the exact but as an approximate value provided by the unstable RS solution.
However, extrapolation from the results of direct numerical experiments for finite size systems
indicates that $\alpha_c \simeq 0.938$ [43], which implies that the effect of RSB is not significant for the
evaluation of $\alpha_c$ in this particular case.
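For very small $N$, the number of binary weight vectors that separate a set of random orthogonal patterns can be counted exhaustively. The sketch below illustrates this setting only and is not the procedure behind the finite-size experiments cited above, which is not described in this text. It assumes that orthonormal rows obtained by Gram-Schmidt on Gaussian vectors are an acceptable way to realize the spectrum $\rho(\lambda) = (1 - \alpha)\delta(\lambda) + \alpha\delta(\lambda - 1)$; all names and the sizes $N = 12$, $p = 6$ are hypothetical choices.

```python
import itertools
import math
import random


def random_orthonormal_rows(p, n, rng):
    """Gram-Schmidt on Gaussian vectors: p orthonormal rows in R^n (p <= n)."""
    rows = []
    for _ in range(p):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def count_separating_weights(x_rows, labels):
    """Count w in {-1,+1}^N with y_mu * Delta_mu > 0 for every pattern mu."""
    n = len(x_rows[0])
    count = 0
    for w in itertools.product((-1, 1), repeat=n):
        if all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0
               for x, y in zip(x_rows, labels)):
            count += 1
    return count


rng = random.Random(2)
N, P_COUNT = 12, 6  # alpha = p/N = 0.5
X = random_orthonormal_rows(P_COUNT, N, rng)  # X X^T = I_p
labels = [rng.choice((-1, 1)) for _ in range(P_COUNT)]
n_solutions = count_separating_weights(X, labels)
# Entropy of w per element for this single sample (minus infinity if no solution).
entropy_per_element = (math.log(n_solutions) / N
                       if n_solutions > 0 else float("-inf"))
```

Since the rows of $X$ are orthonormal, $X^{\rm T}X$ has $p$ unit eigenvalues and $N - p$ zero eigenvalues. Systems this small are far from the large-system limit, so the sample entropy should not be compared quantitatively with the curve in Figure 1.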



5. Summary 

We have developed a framework for analyzing the inference performance of densely connected
single-layer networks, typical examples of which are perceptrons and models of linear vector
channels. The development is intended for dealing with correlated patterns. For this purpose,
we have developed two methodologies based on the replica method and the Thouless-Anderson-Palmer
approach, which are standard tools from the statistical mechanics of disordered systems,
introducing a certain random assumption about the singular value decomposition of the pattern
matrix. The validity and utility of the developed schemes are shown for two existing results and
a novel problem.

Figure 1. Entropy of $w$ (per element) versus the pattern ratio $\alpha$. For details, see the main
text.

Investigation of the properties of algorithms for solving the TAP equations (36)-(43)
[m [T71 dH] and variants of them [T^l dSl \Ml , as well as application of the developed framework 
to real world data analysis [121 HB] and various channel models [121 [20] , are promising topics for 
future research. 